
Make storage upload have no chunk size by default. #606

Closed
dhermes wants to merge 1 commit into GoogleCloudPlatform:master from dhermes:fix-546

Conversation

@dhermes (Contributor) commented Feb 10, 2015

Fixes #546.

@tseaver @thobrla

I took a stab at the issue but am still unsure of a few things:

  • Does apitools actually attempt a full upload when chunksize=None? (/cc @craigcitro)
  • Would we be better off with a boolean like use_chunked=False, and then just use the chunk size on the Blob? (Both options are sketched below.)
  • Is having a separate keyword unintuitive?

Also @thobrla, I'm curious re: "We should allow the user to override the chunk size". Do we need a default chunk size for downloads? If someone creates an object with a custom chunk size, should we use it in this method, or still try the "all in one go" approach?
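
For the second bullet, a rough sketch of the two shapes in question (signatures abbreviated; upload_chunk_size and use_chunked are placeholder names, not settled API):

```python
# Option A: an explicit keyword; None means "upload in one request".
def upload_from_file(self, file_obj, rewind=False, size=None,
                     content_type=None, upload_chunk_size=None):
    ...

# Option B: a boolean switch that falls back to the Blob's own chunk size.
def upload_from_file(self, file_obj, rewind=False, size=None,
                     content_type=None, use_chunked=False):
    chunk_size = self.chunk_size if use_chunked else None
    ...
```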

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 10, 2015
@coveralls (Coverage Status)

Coverage remained the same at 100.0% when pulling afc9994 on dhermes:fix-546 into b395360 on GoogleCloudPlatform:master.

@thobrla commented Feb 10, 2015

It hasn't been merged into the mainline yet, but I implemented a chunksize=None approach for uploads in gsutil's fork of apitools: https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/third_party/storage_apitools/transfer.py#L773

However, because you're using httplib2, there are a number of other issues that you'll need to solve to get this to work. https://github.com/GoogleCloudPlatform/gsutil/blob/master/gslib/gcs_json_media.py has some examples.
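
For illustration, a minimal sketch of what chunksize=None boils down to (assumed semantics, not gsutil's actual transfer.py): with no chunk size configured, the whole stream goes up in a single request.

```python
import io

def stream_media(stream, total_size, send, chunksize=None):
    """Sketch only: send(data, offset, total_size) stands in for one HTTP request."""
    if chunksize is None:
        # No chunk size configured: attempt the full upload at once.
        send(stream.read(), 0, total_size)
        return
    progress = 0
    while progress < total_size:
        data = stream.read(chunksize)
        if not data:  # defensive: stream ended early
            break
        send(data, progress, total_size)
        progress += len(data)

sent = []
stream_media(io.BytesIO(b'abcdef'), 6, lambda d, off, t: sent.append(d))
assert sent == [b'abcdef']  # one request, no chunking
```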

@thobrla commented Feb 10, 2015

As for chunk size in general, the ideal thing is not to use chunks for uploads or downloads at all unless you have a specific reason to do so (for example, buffering a streaming transfer).
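
To illustrate that exception (hypothetical helper, not library code): a source of unknown length has to be buffered into fixed-size pieces, whereas a regular file can be sent in one request.

```python
import io

def iter_chunks(stream, chunk_size):
    """Yield fixed-size pieces from a stream whose total length is unknown."""
    while True:
        data = stream.read(chunk_size)
        if not data:
            return
        yield data

# Each piece would back one request of a chunked, resumable transfer.
pieces = list(iter_chunks(io.BytesIO(b'streaming data of unknown length'), 8))
assert b''.join(pieces) == b'streaming data of unknown length'
```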

@tseaver (Contributor) commented Feb 10, 2015

Rather than passing in an upload_chunk_size, wouldn't it be more sensible to have Blob.CHUNK_SIZE be None as the default, but settable for cases where the application wants to do chunking (maybe via a subclass)?
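
Something like this, say (a sketch of the class-attribute idea, not code from this PR):

```python
class Blob(object):
    CHUNK_SIZE = None  # None => transfer everything in one request

    def _effective_chunk_size(self):
        # Upload/download code would consult this instead of a keyword.
        return self.CHUNK_SIZE

class ChunkedBlob(Blob):
    CHUNK_SIZE = 1024 * 1024  # applications opt in via a subclass

assert Blob()._effective_chunk_size() is None
assert ChunkedBlob()._effective_chunk_size() == 1024 * 1024
```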

@dhermes (Contributor, Author) commented Feb 12, 2015

@tseaver That's why I asked "Do we need a default chunk size for downloads?".

From @thobrla's answer above, it seems a default is not needed.

I'm still unclear whether passing chunksize=None to our vendored-in code is sufficient to fix #546.

@tseaver (Contributor) commented Feb 12, 2015

Are we leaving the blob's own CHUNK_SIZE around, but just using it for downloads? Should we be making it an explicit parameter there, too?

@dhermes (Contributor, Author) commented Feb 12, 2015

I plan to implement the default chunk_size=None and then use blob.chunk_size the way the class constant was used.

I just wanted to make sure passing chunksize=None to our vendored-in code is sufficient before moving much further.
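
i.e., roughly this shape (a hypothetical sketch of the plan, not the final diff):

```python
class Blob(object):
    def __init__(self, name, chunk_size=None):
        self.name = name
        # None means "single-request transfer"; setting a size opts in
        # to chunking, the way the old class constant opted everyone in.
        self.chunk_size = chunk_size

blob = Blob('my-object')                        # no chunking by default
chunked = Blob('my-object', chunk_size=262144)  # caller opts in explicitly
```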

@craigcitro (Contributor):

so i've finally got myself an official repo for apitools; i'm planning on finally doing a round-robin merge to get all the copies back in sync. in particular:

  • if you're waiting on a fix that's only in the gsutil branch, let me know, and
  • we can also hopefully get you out of the business of having to keep your own copy of apitools code.

@dhermes (Contributor, Author) commented Feb 12, 2015

@craigcitro w00t! Thanks for the news.

The main question is

Does apitools actually attempt a full upload when chunksize=None?

which @thobrla indicates is only in upstream (gsutil).

@dhermes dhermes added the api: storage Issues related to the Cloud Storage API. label Feb 14, 2015
@dhermes (Contributor, Author) commented Feb 18, 2015

Ping @craigcitro

Does apitools actually attempt a full upload when chunksize=None?

@craigcitro (Contributor):

sorry, i missed this question earlier in the thread. i believe the code you're looking for is still upstream in gsutil, but i'm working on getting the various versions of apitools unified.

@dhermes (Contributor, Author) commented Feb 19, 2015

@craigcitro Do you need help on the unification?

@dhermes (Contributor, Author) commented Mar 2, 2015

UPDATE: Chatting with @craigcitro today (March 2, 2015), it seems the gsutil upstream changes have been merged into https://github.com/google/apitools

@tseaver Do you want to vendor in the latest updates from apitools, and then I'll rebase? (I could vendor them in too, but I presume you could do it faster since you did it the first two times.)

@tseaver (Contributor) commented Mar 2, 2015

Yeah, I'll vendor it in.

@tseaver tseaver closed this Mar 2, 2015
@tseaver tseaver reopened this Mar 2, 2015
@tseaver (Contributor) commented Mar 3, 2015

@craigcitro I've done the re-vendoring, but now lack context to get the last set of tests passing:

======================================================================
ERROR: test_upload_from_file_resumable (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 418, in test_upload_from_file_resumable
    blob.upload_from_file(fh, rewind=True)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/blob.py", line 354, in upload_from_file
    finish_callback=lambda *args: None)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/_gcloud_vendor/apitools/base/py/transfer.py", line 804, in StreamInChunks
    additional_headers=additional_headers)
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/_gcloud_vendor/apitools/base/py/transfer.py", line 764, in __StreamMedia
    '%d' % self.progress)
CommunicationError: Failed to transfer all bytes in chunk, upload paused at byte 4

======================================================================
FAIL: test_download_as_string (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 328, in test_download_as_string
    self.assertEqual(fetched, b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef


======================================================================
FAIL: test_download_to_file (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 272, in test_download_to_file
    self.assertEqual(fh.getvalue(), b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef


======================================================================
FAIL: test_download_to_filename (gcloud.storage.test_blob.Test_Blob)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tseaver/projects/agendaless/Google/src/gcloud-python/gcloud/storage/test_blob.py", line 307, in test_download_to_filename
    self.assertEqual(wrote, b'abcdef')
AssertionError: 'abc' != 'abcdef'
- abc
+ abcdef


----------------------------------------------------------------------

It looks as though the code which deals with explicitly chunked responses from the server side is now broken.
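
A guess at what the download failures encode (expected semantics only, not the vendored code): with a 3-byte chunk size and a 6-byte object, the transfer loop should issue two ranged requests, but it appears to stop after the first.

```python
def download_all(fetch_range, total_size, chunk_size):
    """Sketch: fetch_range(start, end) stands in for one ranged GET of [start, end)."""
    received = b''
    while len(received) < total_size:
        start = len(received)
        end = min(start + chunk_size, total_size)
        received += fetch_range(start, end)
    return received

# Mirrors the failing assertion: both chunks, not just b'abc'.
assert download_all(lambda s, e: b'abcdef'[s:e], 6, 3) == b'abcdef'
```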

@craigcitro (Contributor):

is the easiest way for me to play with it just to clone your branch?

@tseaver (Contributor) commented Mar 3, 2015

Likely so: I don't want to make it a PR here until things pass. Just run tox -e py27 to see it fail.

@dhermes (Contributor, Author) commented Mar 23, 2015

@tseaver @craigcitro I'm happy to take over vendoring in HEAD of apitools. What is required? @tseaver, what stopped you before?

@craigcitro (Contributor):

i think he was hitting test failures on upload/download; i'm intending to try it, but i've been believing that for 3+ weeks. 😉

@dhermes (Contributor, Author) commented Mar 23, 2015

I can babysit you if you like.

@craigcitro (Contributor):

my issue is that it hasn't hit the top of my todo list -- currently heads-down on something else completely.

gsutil is currently using the version of apitools at master, so i'm reasonably confident in that code. i just need to drop the new version in and see why it's failing for veneer.

@dhermes (Contributor, Author) commented Mar 24, 2015

Cool

@tseaver (Contributor) commented Mar 24, 2015

#755 will lead to de-vendoring apitools, which will then require us to get our tests compatible with the new semantics (causing the failures above).

Once that lands, we can re-visit this one.

@dhermes (Contributor, Author) commented Mar 24, 2015

I agree. After hacking on #754 and a late-night chat with Craig, it seems that's the best way forward.

@tseaver (Contributor) commented Apr 9, 2015

@dhermes It seems like the merge of #811 requires revisiting/rebasing this PR?

@dhermes (Contributor, Author) commented Apr 9, 2015

Yes, this PR is the one that made us realize #811 was "necessary" (more like an option).

@dhermes (Contributor, Author) commented Apr 9, 2015

Closing this PR and going to take the new approach of just making CHUNK_SIZE optional.
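
Roughly what "CHUNK_SIZE optional" could look like in use (a hypothetical sketch, not the follow-up diff; the 256 KB granularity is the documented requirement for GCS resumable-upload chunks):

```python
class Blob(object):
    """Sketch of an optional, validated chunk size."""

    _CHUNK_SIZE_MULTIPLE = 256 * 1024  # resumable chunks use 256 KB granularity

    def __init__(self, name, chunk_size=None):
        self.name = name
        self.chunk_size = None  # default: single-request transfers
        if chunk_size is not None:
            self.set_chunk_size(chunk_size)

    def set_chunk_size(self, value):
        # Opting in to chunking validates against the API's granularity.
        if value % self._CHUNK_SIZE_MULTIPLE != 0:
            raise ValueError('Chunk size must be a multiple of 256 KB.')
        self.chunk_size = value

blob = Blob('my-object')          # chunking off by default
blob.set_chunk_size(1024 * 1024)  # opt in to 1 MB chunks
```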

Labels: api: storage, cla: yes

Successfully merging this pull request may close these issues: resumable uploads should try to send the file in one chunk by default (#546).